Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSOC] Add surrogate data generation for significant connectivity estimation #223

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

tsbinns
Copy link
Collaborator

@tsbinns tsbinns commented Aug 5, 2024

Follows from the changes in #220, so the diff will be quite smaller when that's merged.

In brief, adds a new function make_surrogate_connectivity in the datasets module (felt this was an appropriate place).

Since datasets is more populous now I thought it could make sense to clean things up and bring the tests from make_signals_in_freq_bands from the generic tests folder to one within the datasets folder (where the make_surrogate_connectivity tests also are).

Also added an example demonstrating the use of the new function.

Comment on lines +35 to +70
Notes
-----
Surrogate data is generated by randomly shuffling the order of epochs, independently
for each channel. This destroys the covariance of the data, such that connectivity
estimates should reflect the null hypothesis of no genuine connectivity between
signals (e.g., only interactions due to background noise)
:footcite:`PellegriniEtAl2023`.

For the surrogate data to properly reflect a null hypothesis, the data which is
shuffled **must not** have a temporal structure that is consistent across epochs.
Examples of this data include evoked potentials, where a stimulus is presented or an
action performed at a set time during each epoch. Such data should not be used for
generating surrogates, as even after shuffling the epochs, it will still show a high
degree of residual connectivity between channels. As a result, connectivity
estimates from your surrogate data will capture genuine interactions, instead of the
desired background noise. Treating these estimates as a null hypothesis will
increase the likelihood of a type II (false negative) error, i.e., that there is no
significant connectivity in your data.

Appropriate data for generating surrogates includes data from a resting state,
inter-trial period, or similar. Here, a strong temporal consistency across epochs is
not assumed, reducing the chances that connectivity information of interest is
captured in your surrogate connectivity estimates.

In situations where you want to assess whether evoked data has significant
connectivity, you can generate your surrogate connectivity estimates from non-evoked
data (e.g., rest data, inter-trial data) and compare this to your true connectivity
estimates from the evoked data.

Regardless of whether you are working with evoked or non-evoked data, **you should
always compare true and surrogate connectivity estimates from epochs of the same
duration**. This will ensure that spectral information is captured with the same
accuracy in both sets of connectivity estimates. Ideally, **you should also compare
true and surrogate connectivity estimates from the same number of epochs** to avoid
biases from noise (fewer epochs gives noisier estimates) or finite sample sizes
(e.g., in coherency, phase-locking value, etc... :footcite:`VinckEtAl2010`).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very long notes section, but like we discussed, there are important considerations as to what data surrogates should be generated from.

@tsbinns
Copy link
Collaborator Author

tsbinns commented Aug 5, 2024

Hmm, building docs is failing here on the new example (fine for me locally), but doesn't give a reason beyond

make: *** [Makefile:56: html] Killed
Exited with code exit status 2

Could it be that it doesn't like multiprocessing? I'm using n_jobs=-1 to speed up connectivity computation on the multiple shuffles.

@larsoner
Copy link
Member

larsoner commented Aug 5, 2024

Often that's a sign that you've used too much memory, and decreasing the number of jobs can help

@tsbinns
Copy link
Collaborator Author

tsbinns commented Aug 6, 2024

Okay, even using 1/4 the CPU count ends up using too much memory (find it surprising with 36 cores and 70 GB). Just reverting to a single job and taking the hit on runtime.

@tsbinns
Copy link
Collaborator Author

tsbinns commented Aug 7, 2024

Switching CI back to upstream since mne-tools/mne-python#12747 was merged.

@tsbinns
Copy link
Collaborator Author

tsbinns commented Aug 20, 2024

Wanted to make sure everything was green before submitting final GSoC stuff. Newly failing tests addressed in #228

@larsoner
Copy link
Member

@wmvanvliet want to look and merge if you're happy?

Copy link
Member

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks so great! thanks for all the hard work @tsbinns

https://output.circle-artifacts.com/output/job/bafe351e-3009-46ea-92ce-863de6768196/artifacts/0/dev/auto_examples/surrogate_connectivity.html#sphx-glr-auto-examples-surrogate-connectivity-py

Few comments:

  1. would it be great to actually print the pvalue too potentially besides just writing that it's <0.05?
  2. When comparing significance across the frequency band, the major difference appears in the alpha band, but you test the beta band. Perhaps it's worth mentioning that you test the beta band specifically since the alpha band is clearly significantly different (and due to simplicity for running the example), but in practice one can test both. You do allude to this point in multiple comparisons discussion, but that's one question I had when reading through that section

@tsbinns
Copy link
Collaborator Author

tsbinns commented Nov 11, 2024

Thanks for the review @adam2392! Very good points, will have a look at implementing them by the end of the week. Cheers!

@tsbinns
Copy link
Collaborator Author

tsbinns commented Nov 12, 2024

Comments should be addressed now @adam2392.

Copy link
Member

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants